Easy2Siksha.com
GNDU QUESTION PAPERS 2024
BA/BSc 6th SEMESTER
QUANTITATIVE TECHNIQUES – VI
Time Allowed: 3 Hours Maximum Marks: 100
Note: Aempt Five quesons in all, selecng at least One queson from each secon.
The Fih queson may be aempted from any secon.
All quesons carry equal marks.
SECTION – A
1. Dierenate between Mathemacal Economics and Econometrics.
Discuss in detail the sources of disturbance term in a stochasc econometric model.
2. Derive the β-coecients for Simple Linear Regression Model through Least Squares
Esmaon Method.
Also, illustrate the assumpons of simple linear regression model.
SECTION – B
3. State and prove the Gauss–Markov Theorem for a general linear regression model.
4. (a) What is the Coefficient of Determination?
Differentiate between the Coefficient of Determination and the Adjusted Coefficient of
Determination.
(b) The following pairs of values of X and Y are given and the relationship to be estimated is:
Y = β₀ + β₁X + u
Test the significance of the parameters at the 5 percent level of significance and find R²:
Y:  60   90  110  125  150  170  180  200  220  230
X: 100  150  200  250  300  350  400  450  500  550
SECTION – C
5. Discuss in detail the consequences and remedial measures for heteroscedasticity.
6. (a) In case of the following model:
Y = β₀ + β₁X₁ + β₂X₂ + u
Suppose X₂ is mistakenly omitted from the above model.
Find the specification bias.
(b) Explain Frisch's confluence and Farrar–Glauber tests of multicollinearity in detail.
SECTION – D
7. What do you understand by the problem of autocorrelation?
Discuss in detail the Durbin–Watson test and the remedies of autocorrelation.
8. (a) Discuss the Koyck approach to distributed lag models.
(b) How is the dummy variable model an alternative to the Chow test?
GNDU ANSWER PAPERS 2024
BA/BSc 6th SEMESTER
QUANTITATIVE TECHNIQUES – VI
Time Allowed: 3 Hours Maximum Marks: 100
Note: Aempt Five quesons in all, selecng at least One queson from each secon.
The Fih queson may be aempted from any secon.
All quesons carry equal marks.
SECTION – A
1. Dierenate between Mathemacal Economics and Econometrics.
Discuss in detail the sources of disturbance term in a stochasc econometric model.
Ans: Difference between Mathematical Economics and Econometrics
Imagine economics as a way to understand how people and markets behave. Now, there are
two powerful tools economists use to do this:
Mathematical Economics → uses math language to express theory
Econometrics → uses data and statistics to test theory
Let’s explore both in a relatable way.
Mathematical Economics: The language of economic theory
Mathematical economics is like writing economics in the language of mathematics. Instead
of long sentences, economists use equations and symbols to describe relationships.
For example:
We know from theory that when price increases, demand decreases.
In mathematical economics, we write:
󰇛󰇜
Easy2Siksha.com
This simply means:
“Demand depends on price.”
If we want to be more specific:
Qd = a − bP
This equation shows a precise relationship between demand and price.
Notice something important:
This equation is exact and theoretical.
It assumes everything else is constant and perfect.
So mathematical economics focuses on:
Building economic models
Expressing theory precisely
Logical deduction
Optimization (profit maximization, cost minimization)
It does not check data.
It only says what should happen according to theory.
Econometrics: Testing theory with real-world data
Now imagine we go to a real market and collect data:
Price | Demand
10    | 100
12    |  95
15    |  80
20    |  60
We want to know:
“Does demand really fall when price rises?”
Here comes econometrics.
Econometrics takes the theoretical equation:
Qd = a − bP
and converts it into a statistical model:
Qd = a − bP + u
That small term u is very important.
It represents real-world imperfections.
Econometrics then uses:
Data
Statistics
Regression analysis
to estimate values of a and b and test whether theory is true.
Key Differences (in simple comparison)
Aspect       | Mathematical Economics     | Econometrics
Nature       | Theoretical                | Empirical (data-based)
Tools        | Mathematics                | Statistics + Mathematics
Purpose      | Formulate economic theory  | Test economic theory
Form         | Exact (deterministic)      | Probabilistic (stochastic)
Uses data?   | No                         | Yes
Example      | Demand function            | Estimated demand equation
In short:
Mathematical economics tells us what should happen.
Econometrics tells us what actually happens.
Disturbance Term in a Stochastic Econometric Model
Now let’s focus on the second part—this is the heart of econometrics.
In reality, economic behavior is never perfectly predictable. People differ, conditions
change, and many factors are unobserved. So econometric models include a disturbance
term (u).
What is a disturbance term?
Suppose theory says:
C = f(Y)   (consumption is a function of income)
But in real life, two people with the same income may spend differently.
Why?
Because consumption depends on many things:
habits
preferences
family size
expectations
culture
We cannot include everything in the model.
So we write:
C = a + bY + u
Here u (disturbance term) represents all unexplained influences.
It is basically the gap between theory and reality.
Sources of the Disturbance Term (Explained in Detail)
The disturbance term arises from several real-world reasons. Let’s understand each clearly
with examples.
1. Omission of relevant variables
This is the most important source.
We cannot include all variables affecting a dependent variable.
Example:
Consumption depends on:
income
wealth
family size
expectations
interest rates
But we may include only income.
So the effects of omitted factors enter the disturbance term.
Therefore:
Disturbance term = effect of omitted variables
2. Measurement errors
Economic data are rarely perfectly accurate.
Examples:
Income underreported in surveys
Price index approximations
GDP revisions
Consumption recall errors
If income is measured wrongly, the model cannot capture the true relationship.
So errors appear in the disturbance term.
3. Incorrect functional form
Sometimes the true relationship is nonlinear, but we assume linear.
Example:
True relation: Y = a + bX + cX²   (nonlinear)
But we estimate: Y = a + bX   (linear)
The mismatch between reality and the model goes into u.
So disturbance term captures specification errors.
4. Random human behavior
Human decisions are partly unpredictable.
Two individuals with same:
income
age
education
may still behave differently due to psychology or mood.
This randomness is unavoidable.
Hence disturbance term includes behavioral randomness.
5. Aggregation effects
Econometric models often use aggregate data:
national consumption
average income
total demand
But individuals differ greatly.
When we aggregate, individual variations remain unexplained.
These differences enter the disturbance term.
6. External shocks and unforeseen events
Economies face unexpected influences:
policy changes
strikes
pandemics
wars
weather shocks
These cannot be predicted or included fully.
So they appear in u.
7. Pure statistical noise
Even with perfect modeling, random fluctuations occur.
Example:
survey sampling variation
rounding errors
recording errors
These create unavoidable statistical noise.
Why the Disturbance Term is Essential
Without the disturbance term, econometrics would fail.
If we wrote:
C = a + bY   (with no error term)
we would assume perfect prediction, which is impossible in social science.
The disturbance term allows:
uncertainty
probability
statistical inference
hypothesis testing
So econometrics becomes realistic.
Conceptual Meaning
A disturbance term in a stochastic econometric model represents the combined effect of
omitted variables, measurement errors, incorrect model specification, random human
behavior, aggregation effects, and unforeseen external shocks. It captures the difference
between observed values and theoretical predictions and introduces randomness into
econometric relationships, making statistical estimation and hypothesis testing possible.
2. Derive the β-coefficients for the Simple Linear Regression Model through the Least Squares
Estimation Method.
Also, illustrate the assumptions of the simple linear regression model.
Ans: The Simple Linear Regression Model
The model is written as:
Y = β₀ + β₁X + u
Where:
Y = dependent variable (outcome we want to predict)
X = independent variable (predictor)
β₀ = intercept (value of Y when X = 0)
β₁ = slope (change in Y for a unit change in X)
u = error term (captures unexplained variation)
The goal is to estimate β₀ and β₁ using sample data.
Least Squares Estimation
The least squares method minimizes the sum of squared errors (residuals). Residuals are
the differences between observed values (Yᵢ) and predicted values (Ŷᵢ).
Residual: eᵢ = Yᵢ − Ŷᵢ
The objective is:
min Σ(Yᵢ − β₀ − β₁Xᵢ)²
This ensures the regression line fits the data as closely as possible.
Derivation of β-Coefficients
Step 1: Define the Residual Sum of Squares (RSS)
RSS = Σ(Yᵢ − β₀ − β₁Xᵢ)²
Step 2: Differentiate with Respect to β₀ and β₁
We take partial derivatives of RSS with respect to β₀ and β₁, and set them equal to zero
(first-order conditions):
∂RSS/∂β₀ = −2 Σ(Yᵢ − β₀ − β₁Xᵢ) = 0
∂RSS/∂β₁ = −2 Σ Xᵢ(Yᵢ − β₀ − β₁Xᵢ) = 0
Step 3: Solve the Equations
From the first condition:
ΣYᵢ = nβ₀ + β₁ΣXᵢ
From the second condition:
ΣXᵢYᵢ = β₀ΣXᵢ + β₁ΣXᵢ²
Step 4: Express in Terms of Means
Let:
X̄ = ΣXᵢ/n and Ȳ = ΣYᵢ/n
Then:
β̂₁ = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / Σ(Xᵢ − X̄)²
β̂₀ = Ȳ − β̂₁X̄
Interpretation
β̂₁ (slope): Measures how much Y changes when X increases by one unit.
β̂₀ (intercept): The predicted value of Y when X = 0.
This derivation shows how regression coefficients are calculated directly from data using
least squares.
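The two formulas from Step 4 can be checked numerically. Below is a minimal Python sketch (illustrative only; the study-hours data are hypothetical, not from the paper) that computes β̂₀ and β̂₁ exactly as derived above:

```python
import numpy as np

# Hypothetical data: study hours (X) and exam scores (Y)
X = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
Y = np.array([25, 31, 34, 40, 44, 51, 54, 61], dtype=float)

x_bar, y_bar = X.mean(), Y.mean()

# Slope: sum of cross-deviations over sum of squared deviations of X
beta1_hat = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
# Intercept: the fitted line passes through the point of means
beta0_hat = y_bar - beta1_hat * x_bar

print(f"beta0_hat = {beta0_hat:.3f}, beta1_hat = {beta1_hat:.3f}")
```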
Assumptions of the Simple Linear Regression Model
For the estimates to be valid and unbiased, certain assumptions must hold:
1. Linearity:
o The relationship between X and Y is linear.
o Example: If X = study hours, Y = exam score, the effect is assumed to be
straight-line.
2. Independence of Errors:
o Residuals (errors) are independent of each other.
o No autocorrelation (important in time-series data).
3. Homoscedasticity (Constant Variance):
o The variance of errors is constant across all values of X.
o If variance increases with X, results may be unreliable.
4. Normality of Errors:
o Errors are normally distributed, especially important for hypothesis testing.
5. No Perfect Multicollinearity:
o In simple regression, only one predictor is used, so this assumption is
naturally satisfied.
6. Exogeneity:
o The independent variable X is not correlated with the error term.
o If violated, estimates become biased.
Example to Illustrate
Suppose we study the effect of hours studied (X) on exam score (Y).
Collect data from 10 students.
Use least squares to estimate β̂₀ and β̂₁.
If β̂₁ = 5, it means each extra hour of study increases the score by 5 marks.
If β̂₀ = 20, it means a student who studies 0 hours is expected to score 20 marks.
Conclusion
The least squares method derives regression coefficients by minimizing the sum of squared
errors, leading to the formulas for β̂₀ and β̂₁. The model rests on assumptions like linearity,
independence, homoscedasticity, and normality.
SECTION – B
3. State and prove the Gauss–Markov Theorem for a general linear regression model.
Ans: Gauss–Markov Theorem (General Linear Regression Model)
1. The General Linear Regression Model
First, we need to understand the setting in which the theorem works.
A general linear regression model can be written in matrix form as:
Y = Xβ + ε
Let’s decode this in simple terms:
Y → vector of observed dependent variable values
X → matrix of independent variables (predictors)
β (beta) → vector of unknown parameters (coefficients we want to estimate)
ε (epsilon) → vector of random errors
So in words:
Observed data = systematic part + random noise
2. Assumptions of the Gauss–Markov Theorem
The theorem works under some important assumptions. Think of these as “rules of the
game.”
(1) Linearity
The model is linear in parameters:
Y = Xβ + ε
This means coefficients appear linearly (no squares or products of β).
(2) Zero Mean of Errors
E(ε) = 0
On average, errors cancel out.
So the model is not systematically biased.
(3) Constant Variance (Homoscedasticity)
Var(εᵢ) = σ² for all i
All observations have equal error variance.
No observation is more “uncertain” than others.
(4) No Autocorrelation
Errors are independent:
Cov(εᵢ, εⱼ) = 0 for all i ≠ j
So one error does not influence another.
(5) Full Rank of X
Independent variables are not perfectly correlated.
This ensures coefficients can be uniquely estimated.
3. Statement of Gauss–Markov’s Theorem
Now the big idea:
Gauss–Markov Theorem:
Under the classical linear regression assumptions, the ordinary least squares (OLS) estimator
of β is the Best Linear Unbiased Estimator (BLUE).
4. What does BLUE mean?
This is very important. Let’s break it:
Linear
Estimator is a linear function of Y.
OLS estimator:
β̂ = (X′X)⁻¹X′Y
This is linear in Y.
Unbiased
E(β̂) = β
On average, OLS gives the true coefficients.
Best
“Best” means minimum variance among all linear unbiased estimators.
No other linear unbiased estimator has smaller variance than OLS.
5. Proof of Gauss–Markov’s Theorem (Step-by-Step)
We now prove that OLS is BLUE.
Step 1: Write the OLS Estimator
OLS estimator:
β̂ = (X′X)⁻¹X′Y
Substitute the model:
Y = Xβ + ε
So:
β̂ = (X′X)⁻¹X′(Xβ + ε) = (X′X)⁻¹X′Xβ + (X′X)⁻¹X′ε = β + (X′X)⁻¹X′ε
Step 2: Show Unbiasedness
Take expectation:
E(β̂) = β + (X′X)⁻¹X′E(ε)
Since:
E(ε) = 0
So:
E(β̂) = β
OLS is unbiased.
Step 3: Variance of the OLS Estimator
We already have:
β̂ − β = (X′X)⁻¹X′ε
Variance:
Var(β̂) = E[(β̂ − β)(β̂ − β)′] = E[(X′X)⁻¹X′εε′X(X′X)⁻¹]
Moving the non-random matrices outside the expectation:
Var(β̂) = (X′X)⁻¹X′ E(εε′) X(X′X)⁻¹
But:
E(εε′) = σ²I
Thus:
Var(β̂) = σ²(X′X)⁻¹
Step 4: Consider Any Other Linear Unbiased Estimator
Let another estimator be:
b = CY
where C is some matrix.
For unbiasedness:
E(b) = E(CY) = CXβ
To equal β for every β:
CX = I
So any linear unbiased estimator must satisfy:
CX = I
Step 5: Compare Variances
Variance of the alternative estimator:
Var(b) = C Var(Y) C′
But:
Var(Y) = σ²I
So:
Var(b) = σ²CC′
Step 6: Express C in Terms of OLS
We know:
CX = I
The OLS weighting matrix is:
(X′X)⁻¹X′
So write:
C = (X′X)⁻¹X′ + D
where:
DX = 0
Step 7: Variance Difference
Now:
Var(b) = σ²CC′ = σ²[(X′X)⁻¹X′ + D][(X′X)⁻¹X′ + D]′
Expanding, and using DX = 0 (so the cross terms vanish):
Var(b) = σ²(X′X)⁻¹ + σ²DD′
So:
Var(b) = Var(β̂) + σ²DD′
Step 8: Conclude Minimum Variance
Since:
DD′ is positive semi-definite,
therefore:
Var(b) ≥ Var(β̂)
Thus:
No linear unbiased estimator has smaller variance than OLS.
Final Conclusion (Gauss–Markov Theorem)
We have shown:
OLS is linear
OLS is unbiased
OLS has minimum variance
Therefore:
OLS is the BLUE estimator
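As an illustration (not part of the original proof), a small Monte Carlo sketch in Python can make BLUE concrete: it compares the OLS slope with another linear unbiased estimator, the line through the two endpoint observations. All data here are simulated assumptions; both estimators come out unbiased, but OLS shows the smaller variance, as the theorem predicts.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 20)
X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept
beta_true = np.array([2.0, 0.5])

ols_slopes, endpoint_slopes = [], []
for _ in range(5000):
    y = X @ beta_true + rng.normal(0, 1, size=len(x))
    b_ols = np.linalg.solve(X.T @ X, X.T @ y)      # (X'X)^{-1} X'y
    ols_slopes.append(b_ols[1])
    # Another linear unbiased slope estimator: line through first and last points
    endpoint_slopes.append((y[-1] - y[0]) / (x[-1] - x[0]))

print("mean OLS slope      :", np.mean(ols_slopes))       # both ~0.5 (unbiased)
print("mean endpoint slope :", np.mean(endpoint_slopes))
print("var  OLS slope      :", np.var(ols_slopes))        # OLS variance is smaller
print("var  endpoint slope :", np.var(endpoint_slopes))
```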
6. Intuitive Meaning
Imagine you want to estimate the effect of:
Study hours → exam marks
You could use many estimation methods.
But the Gauss–Markov theorem tells us:
Among all fair (unbiased) linear methods,
OLS gives the most stable and precise estimates.
So OLS is not just convenient, it is mathematically optimal.
7. Important Notes for Exams
Students often confuse this:
Gauss–Markov does NOT require normal errors
Only mean 0 and equal variance
So even without normal distribution:
OLS is still BLUE
Normality is only needed for:
t-tests
F-tests
confidence intervals
8. Why Gauss–Markov Matters
This theorem explains why:
Regression uses least squares
Econometrics trusts OLS
Statistical software defaults to OLS
Because:
It is the most efficient linear unbiased estimator.
4. (a) What is the Coefficient of Determination?
Differentiate between the Coefficient of Determination and the Adjusted Coefficient of
Determination.
Ans: 4(a) What is Coefficient of Determination?
Meaning and Simple Understanding
Imagine you are trying to predict a student’s exam marks based on the number of hours
they study. Naturally, you expect that more study hours usually lead to higher marks. Now
suppose you collect data from many students and create a regression model (a statistical
equation) to predict marks from study hours.
But here comes an important question:
How good is your prediction model?
How much of the variation in marks is actually explained by study hours?
This is exactly what the Coefficient of Determination (R²) tells us.
Definition
The Coefficient of Determination (R²) is a statistical measure that shows how much of the
variation in the dependent variable is explained by the independent variable(s) in a
regression model.
In simple words:
R² tells us how well the model explains the data.
Real-Life Analogy
Think of R² like a report card for your regression model.
If R² = 0.90 → Model explains 90% of the outcome
If R² = 0.50 → Model explains 50%
If R² = 0.10 → Model explains only 10%
So higher R² means better explanation power.
Mathematical Idea (in simple terms)
When we analyze data, there is always some variation in outcomes.
Example: Students have different marks because of many factors:
Study hours
Intelligence
Teaching quality
Health
Exam difficulty
Now suppose your model only uses study hours. It will explain some part of marks variation
but not all.
R² measures:
R² = Explained Variation / Total Variation
So:
Explained variation → what model captures
Total variation → all differences in data
Range of R²
R² always lies between 0 and 1 (0 ≤ R² ≤ 1).
Meaning:
R² = 0 → Model explains nothing
R² = 1 → Perfect explanation
R² = 0.75 → Model explains 75% variation
Interpretation Example
Suppose a regression model predicts salary from education level and R² = 0.80.
This means:
80% of salary differences among people are explained by education level
20% is due to other factors (experience, skills, location, etc.)
Limitations of R²
Here comes an important insight:
R² always increases when you add more variables, even if they are useless.
Example:
Suppose you predict marks using:
Study hours
Shoe size
Favorite color
Even meaningless variables may slightly increase R².
This creates a problem: R² may look better, but the model is not truly better.
To solve this issue, statisticians created Adjusted R².
Adjusted Coefficient of Determination (Adjusted R²)
Meaning
Adjusted R² is a modified version of R² that penalizes unnecessary variables in the model.
It answers:
Does adding this variable actually improve the model?
Or is it just increasing R² artificially?
Simple Definition
Adjusted R² measures the explanatory power of a regression model after adjusting for the
number of predictors.
In simple words:
It checks model quality fairly
It discourages adding useless variables
Why Adjusted R² is Needed
Let’s imagine two models predicting exam marks:
Model 1: Study hours
R² = 0.70
Model 2: Study hours + Shoe size
R² = 0.71
R² increased slightly, but shoe size is meaningless.
Adjusted R² will detect this and may stay the same or even decrease.
So Adjusted R² protects us from overfitting.
Key Idea Difference
R² rewards complexity
Adjusted R² rewards meaningful complexity
Mathematical Insight (simplified)
Adjusted R² includes:
Sample size (n)
Number of predictors (k)
The standard formula is:
Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1)
So it adjusts model performance based on how many variables are used.
Example to Understand Clearly
Suppose we predict house price.
Model A:
Variables:
Size of house
R² = 0.65
Adjusted R² = 0.64
Model B:
Variables:
Size
Age
Distance from city
Owner’s favorite food
R² = 0.68
Adjusted R² = 0.63
Observation:
R² increased (0.65 → 0.68)
Adjusted R² decreased (0.64 → 0.63)
Meaning:
Extra variables did not really improve prediction
Model A is actually better
Difference Between R² and Adjusted R²
Now let’s clearly differentiate them.
1. Basic Meaning
R²:
Measures how much variation is explained by the model.
Adjusted R²:
Measures explained variation after adjusting for number of predictors.
2. Effect of Adding Variables
R²:
Always increases or stays same.
Adjusted R²:
May increase or decrease.
3. Sensitivity to Useless Variables
R²:
Cannot detect useless predictors.
Adjusted R²:
Penalizes useless predictors.
4. Model Comparison
R²:
Not reliable for comparing models with different predictors.
Adjusted R²:
Better for comparing models.
5. Overfitting Detection
R²:
Encourages overfitting.
Adjusted R²:
Helps avoid overfitting.
Comparison Table
Feature                    | Coefficient of Determination (R²) | Adjusted R²
Meaning                    | % variation explained             | Adjusted explanatory power
Range                      | 0 to 1                            | Can be negative to 1
Effect of adding variables | Always increases                  | May decrease
Useless predictors         | Not detected                      | Penalized
Model comparison           | Weak                              | Strong
Overfitting control        | No                                | Yes
Important Concept: Why Adjusted R² Can Be Lower
Adjusted R² becomes lower when:
Too many predictors
Small sample size
Weak relationships
It ensures model honesty.
Everyday Analogy
Think of R² like exam marks without considering difficulty.
Example:
Student A: 90/100 (easy exam)
Student B: 85/100 (very tough exam)
Raw score says A is better.
But difficulty-adjusted score may say B is better.
Similarly:
R² = raw performance
Adjusted R² = fair performance
When to Use R² vs Adjusted R²
Use R² when:
Simple regression
Same number of predictors
Understanding explanatory power
Use Adjusted R² when:
Multiple regression
Comparing models
Variable selection
Final Conceptual Summary
The Coefficient of Determination (R²) tells us how much of the outcome is explained by our
model. It is like a measure of how well our regression equation fits the data.
However, R² alone can be misleading because it automatically increases when we add more
variables even useless ones. To solve this problem, statisticians introduced the Adjusted
Coefficient of Determination (Adjusted R²), which adjusts for the number of predictors and
sample size.
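The contrast can be demonstrated numerically. The following Python sketch (simulated, hypothetical data) fits a model with and without a useless noise predictor: R² can only rise, while Adjusted R², computed with the formula above, can fall.

```python
import numpy as np

def r2_and_adj_r2(y, X):
    """Fit OLS with an intercept; return (R-squared, adjusted R-squared)."""
    Z = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    ss_res = np.sum((y - Z @ b) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    n, k = len(y), Z.shape[1] - 1          # k = number of predictors
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adj

rng = np.random.default_rng(1)
hours = rng.uniform(0, 10, 30)
marks = 20 + 5 * hours + rng.normal(0, 5, 30)
shoe_size = rng.normal(size=30)            # deliberately useless predictor

print(r2_and_adj_r2(marks, hours[:, None]))                       # Model 1
print(r2_and_adj_r2(marks, np.column_stack([hours, shoe_size])))  # Model 2
```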
(b) The following pairs of values of X and Y are given and the relationship to be estimated is:
Y = β₀ + β₁X + u
Test the significance of the parameters at the 5 percent level of significance and find R²:
Y:  60   90  110  125  150  170  180  200  220  230
X: 100  150  200  250  300  350  400  450  500  550
Ans: Step 1: Organize the Data
X   | Y
100 | 60
150 | 90
200 | 110
250 | 125
300 | 150
350 | 170
400 | 180
450 | 200
500 | 220
550 | 230
We have n = 10 observations.
Step 2: Compute Means
ΣX = 3250, ΣY = 1535
X̄ = 3250/10 = 325
Ȳ = 1535/10 = 153.5
Step 3: Estimate β₁ (Slope)
Formula:
β̂₁ = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / Σ(Xᵢ − X̄)²
Here Σ(Xᵢ − X̄)(Yᵢ − Ȳ) = 76,875 and Σ(Xᵢ − X̄)² = 206,250, so:
β̂₁ = 76875/206250 ≈ 0.3727
Step 4: Estimate β₀ (Intercept)
β̂₀ = Ȳ − β̂₁X̄ = 153.5 − (0.3727)(325) ≈ 32.36
So, the regression equation is:
Ŷ = 32.36 + 0.3727X
Step 5: Compute R²
Formula:
R² = [Σ(Xᵢ − X̄)(Yᵢ − Ȳ)]² / [Σ(Xᵢ − X̄)² · Σ(Yᵢ − Ȳ)²]
With Σ(Yᵢ − Ȳ)² = 28,902.5:
R² = 76875² / (206250 × 28902.5) ≈ 0.99
This means the model explains about 99% of the variation in Y, which is extremely strong.
Step 6: Test Significance of Parameters
For β₁ (Slope):
Null hypothesis: H₀: β₁ = 0
Alternative: H₁: β₁ ≠ 0
Test statistic:
t = β̂₁ / se(β̂₁), where se(β̂₁) = √(σ̂² / Σ(Xᵢ − X̄)²) and σ̂² = RSS/(n − 2) ≈ 249.1/8 ≈ 31.1
This gives se(β̂₁) ≈ 0.0123 and t ≈ 30.3, far greater than the critical value at 5% significance
(about 2.306 for df = 8).
Thus, β₁ is highly significant.
For β₀ (Intercept):
Similarly, the intercept is also significant (t ≈ 32.36/4.37 ≈ 7.4), though in regression analysis
the slope is usually the main focus.
Interpretation
The regression line is:
Ŷ = 32.36 + 0.3727X
The slope (0.3727) means: for every unit increase in X, Y increases by about 0.37 units.
The intercept (32.36) means: when X = 0, the predicted Y is about 32.4.
The R² of 0.99 shows the model fits the data extremely well.
Both parameters are statistically significant at the 5% level.
Conclusion
This regression analysis shows a very strong linear relationship between X and Y. The slope
is positive and significant, meaning X strongly influences Y. With an R² of about 0.99, the model
explains nearly all the variation in Y, making it a reliable predictor.
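For verification, here is a short Python sketch that reproduces the calculations above directly from the given data (it assumes numpy and scipy are available):

```python
import numpy as np
from scipy import stats

X = np.array([100, 150, 200, 250, 300, 350, 400, 450, 500, 550], dtype=float)
Y = np.array([60, 90, 110, 125, 150, 170, 180, 200, 220, 230], dtype=float)
n = len(X)

Sxy = np.sum((X - X.mean()) * (Y - Y.mean()))   # 76875
Sxx = np.sum((X - X.mean()) ** 2)               # 206250
Syy = np.sum((Y - Y.mean()) ** 2)               # 28902.5

b1 = Sxy / Sxx                       # ~0.3727
b0 = Y.mean() - b1 * X.mean()        # ~32.36
r2 = Sxy ** 2 / (Sxx * Syy)          # ~0.99

resid = Y - (b0 + b1 * X)
sigma2 = np.sum(resid ** 2) / (n - 2)
t_b1 = b1 / np.sqrt(sigma2 / Sxx)       # ~30.3
t_crit = stats.t.ppf(0.975, df=n - 2)   # ~2.306

print(b0, b1, r2, t_b1, t_crit)
```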
SECTION – C
5. Discuss in detail the consequences and remedial measures for heteroscedasticity.
Ans: Understanding Heteroscedasticity in Simple Words
Imagine you are studying how income affects spending. You collect data from many
households.
Poor households: spending varies a little
Middle-income households: spending varies more
Rich households: spending varies a lot
So the spread (variability) of spending increases with income.
This means the error (difference between predicted and actual spending) is not constant.
This situation is called heteroscedasticity.
Definition:
Heteroscedasticity occurs when the variance of the error term in a regression model is not
constant across observations.
In contrast:
Constant variance → Homoscedasticity (ideal case)
Changing variance → Heteroscedasticity (problematic)
Consequences of Heteroscedasticity
Heteroscedasticity does not destroy the regression model, but it creates important
statistical problems. Let’s discuss them one by one in a student-friendly way.
1. Unreliable Standard Errors
In regression, we estimate coefficients (like slope).
But we also calculate standard errors to measure their reliability.
When heteroscedasticity exists:
Standard errors become incorrect.
This leads to:
Wrong confidence intervals
Wrong hypothesis tests
Example:
A variable may appear significant when it is actually not.
2. Invalid t-tests and F-tests
Students often use regression to test hypotheses, such as:
“Does education significantly affect income?”
But if heteroscedasticity is present:
t-statistics become biased
F-statistics become unreliable
So decisions like:
Accepting or rejecting hypotheses
may be wrong.
This is one of the most serious consequences.
3. Inefficient Estimates (OLS not Best)
One key property of OLS (Ordinary Least Squares) is:
OLS is the Best Linear Unbiased Estimator (BLUE)
(when homoscedasticity holds)
But with heteroscedasticity:
OLS estimates remain unbiased
BUT they are no longer efficient
Meaning:
There exist better estimators with smaller variance.
So we lose statistical efficiency.
4. Poor Prediction Accuracy
Because error variance differs across observations:
Predictions become less reliable for some ranges
Model fits some groups better than others
Example:
Income-spending model predicts poor households well
but rich households poorly.
5. Misleading Goodness of Fit
Measures like:
R²
Standard error of regression
can appear acceptable, yet the model is flawed due to heteroscedasticity.
So researchers may falsely believe the model is good.
Remedial Measures for Heteroscedasticity
Now let’s discuss how to fix or reduce heteroscedasticity. These remedies are commonly
taught in econometrics and statistics.
1. Transform the Data
One of the simplest and most effective remedies.
Common transformations:
Log transformation
Square root transformation
Reciprocal transformation
Example:
Instead of using income and spending:
use log(income) and log(spending)
Why it works:
It reduces scale differences and stabilizes variance.
This is the most widely used method.
2. Weighted Least Squares (WLS)
If we know how variance changes, we can give different weights to observations.
Idea:
High-variance observations → small weight
Low-variance observations → large weight
This balances the regression.
Result:
Estimates become efficient again.
So WLS is a direct statistical correction.
3. Use Robust Standard Errors
Modern econometrics often uses:
Heteroscedasticity-robust standard errors
(also called White’s standard errors)
They do not change coefficients.
They only correct standard errors.
Benefit:
Valid t-tests
Valid F-tests
even when heteroscedasticity exists.
This is extremely popular in research today.
4. Improve Model Specification
Sometimes heteroscedasticity occurs because the model is incomplete.
Example:
Income affects spending
but family size also matters
If family size is omitted → error variance varies.
Solution:
Add relevant variables
Use a better functional form
This reduces heteroscedasticity naturally.
5. Divide Data into Groups
If variance differs across categories:
Example:
Urban vs rural households
Solution:
Run separate regressions for each group.
This makes variance more stable within groups.
6. Increase Sample Size
In many practical cases:
heteroscedasticity decreases with larger samples.
Why?
More observations stabilize variance patterns.
Though not a direct cure, it improves estimation reliability.
Summary (Exam-Ready Conclusion)
Heteroscedasticity refers to the situation in which the variance of the error term in a
regression model is not constant across observations. It commonly occurs in cross-sectional
economic data where variability increases with scale, such as income and consumption.
Its main consequences include unreliable standard errors, invalid hypothesis tests, loss of
efficiency of OLS estimators, poor prediction accuracy, and misleading statistical inference.
Although OLS estimates remain unbiased, they are no longer the best linear unbiased
estimators under heteroscedasticity.
Several remedial measures can be applied to correct or reduce heteroscedasticity. These
include transforming variables (especially logarithmic transformation), applying weighted
least squares, using heteroscedasticity-robust standard errors, improving model
specification by including relevant variables, dividing data into homogeneous groups, and
increasing sample size. Among these, log transformation and robust standard errors are the
most commonly used practical solutions.
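To make the remedies concrete, here is a hedged Python sketch using the statsmodels library (assuming it is installed; the income/spending data are simulated so that the error spread grows with income). It contrasts classical OLS standard errors with White's robust (HC1) standard errors, and a WLS fit whose weights assume Var(uᵢ) ∝ incomeᵢ².

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
income = rng.uniform(10, 100, 200)
# Heteroscedastic by construction: error standard deviation grows with income
spending = 5 + 0.6 * income + rng.normal(0, 0.05 * income)

X = sm.add_constant(income)

ols = sm.OLS(spending, X).fit()                   # classical standard errors
robust = sm.OLS(spending, X).fit(cov_type="HC1")  # White's robust standard errors
wls = sm.WLS(spending, X, weights=1 / income**2).fit()  # weights ∝ 1/variance

print("OLS SEs   :", ols.bse)
print("Robust SEs:", robust.bse)
print("WLS coefs :", wls.params)
```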
6. (a) In case of the following model:
Y = β₀ + β₁X₁ + β₂X₂ + u
Suppose X₂ is mistakenly omitted from the above model.
Find the specification bias.
Ans: The Full Model
We start with the correct specification:
Y = β₀ + β₁X₁ + β₂X₂ + u
Here:
Y = dependent variable
X₁, X₂ = independent variables
β₀, β₁, β₂ = parameters
u = error term
The Problem: Omitting X₂
Suppose we mistakenly estimate the model as:
Y = α₀ + α₁X₁ + v
where v is the new error term that now absorbs both the original error u and the effect of
the omitted variable X₂.
So:
v = β₂X₂ + u
This means the new error term is correlated with X₁ if X₁ and X₂ are correlated. That
correlation breaks one of the key assumptions of regression (exogeneity), leading to
specification bias.
Deriving the Specification Bias
The expected value of the estimated slope coefficient in the misspecified model is:
E(α̂₁) = β₁ + β₂ · Cov(X₁, X₂) / Var(X₁)
The extra term β₂ · Cov(X₁, X₂)/Var(X₁) is the bias introduced by omitting X₂.
Key Insight:
If X₁ and X₂ are uncorrelated, then Cov(X₁, X₂) = 0, and there is no bias.
If they are correlated, the bias depends on both the strength of correlation and the
size of β₂.
Intuitive Explanation
Imagine you're trying to measure the effect of hours studied (X₁) on exam scores (Y). But
you forget to include sleep quality (X₂) in your model.
If sleep quality is unrelated to study hours, no problem: your estimate of study
hours' effect is unbiased.
But if students who study more also sleep less (negative correlation), then omitting
sleep quality will distort the estimated effect of study hours. You might wrongly
conclude that study hours have a weaker or stronger effect than they really do.
That distortion is specification bias.
Assumptions Violated
By omitting X₂, the assumption of zero correlation between the regressors and the error term is
violated:
E[X₁v] ≠ 0
This makes OLS estimates biased and inconsistent.
Practical Consequences
1. Biased Estimates: The slope coefficient no longer reflects the true effect of X₁.
2. Misleading Policy Decisions: If regression is used for policy (say, estimating effect of
education on income), omitting relevant variables can lead to wrong conclusions.
3. Inflated R²: Sometimes the model looks “good” statistically, but the estimates are
misleading.
Conclusion
The specification bias in this case is:
Bias = β₂ · Cov(X₁, X₂) / Var(X₁)
It arises because omitting X₂ makes the error term correlated with X₁. The bias depends on
both the true effect of the omitted variable (β₂) and the correlation between X₁ and X₂.
In simple terms: forgetting a relevant variable twists your results. If the omitted variable is
correlated with the included one, your slope estimate is no longer trustworthy.
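The bias formula can be verified by simulation. This illustrative Python sketch (all parameter values are assumptions) generates correlated X₁ and X₂, omits X₂, and shows that the short-regression slope drifts to β₁ + β₂·Cov(X₁, X₂)/Var(X₁):

```python
import numpy as np

rng = np.random.default_rng(3)
n, beta1, beta2 = 100_000, 1.0, 2.0

x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)     # X2 correlated with X1
y = 1.0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)

# Short regression of Y on X1 alone (X2 mistakenly omitted)
b1_short = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)

# Theoretical bias: beta2 * Cov(X1, X2) / Var(X1)  (= 2 * 0.8 = 1.6 here)
bias = beta2 * np.cov(x1, x2)[0, 1] / np.var(x1, ddof=1)

print(b1_short)        # ~2.6
print(beta1 + bias)    # ~2.6
```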
(b) Explain Frisch's confluence and Farrar–Glauber tests of multicollinearity in detail.
Ans: 1. Frisch’s Confluence Test of Multicollinearity
Basic Idea
The Frisch test (sometimes called Frisch’s confluence analysis) is based on a very intuitive
principle:
If one independent variable can be well explained by other independent variables, then
multicollinearity exists.
In simple words:
If variable X₁ can be predicted from X₂, X₃, X₄, etc., then these variables overlap in
information. They are not independent, and hence multicollinearity exists.
How the Test Works (Step-by-Step)
Suppose we have a regression model:
Y = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + u
To check multicollinearity using Frisch's test:
Step 1: Choose one explanatory variable (say X₁).
Step 2: Regress it on the remaining explanatory variables:
X₁ = a₀ + a₂X₂ + a₃X₃ + e
Step 3: Calculate the coefficient of determination (R²) of this regression.
Step 4: Repeat for each explanatory variable.
Interpretation
If R² is high (close to 1) → X₁ is strongly explained by X₂ and X₃ → multicollinearity
exists.
If R² is low → variables are independent → no multicollinearity.
So the logic is simple:
If independent variables explain each other well → they are not truly independent.
Example (Easy to Visualize)
Imagine you are studying consumption (Y) using:
Income (X₁)
Wealth (X₂)
Savings (X₃)
But wealth and savings depend heavily on income.
If we regress Income on Wealth and Savings and get R² = 0.95,
it means Wealth and Savings almost fully explain Income.
So all three variables carry similar information → multicollinearity.
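A quick auxiliary-regression check in this spirit (an illustrative Python sketch; the income, wealth, and savings data are simulated assumptions):

```python
import numpy as np

def auxiliary_r2(target, others):
    """R-squared from regressing one explanatory variable on the remaining ones."""
    Z = np.column_stack([np.ones(len(target)), others])
    b, *_ = np.linalg.lstsq(Z, target, rcond=None)
    ss_res = np.sum((target - Z @ b) ** 2)
    return 1 - ss_res / np.sum((target - target.mean()) ** 2)

rng = np.random.default_rng(4)
income = rng.normal(50, 10, 100)
wealth = 5 * income + rng.normal(0, 5, 100)      # driven largely by income
savings = 0.2 * income + rng.normal(0, 2, 100)

print(auxiliary_r2(income, np.column_stack([wealth, savings])))  # close to 1
```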
Advantages of Frisch Test
Very simple and intuitive
Shows which variable causes multicollinearity
Easy to compute
Limitations
No clear critical value (no formal statistical test)
Subjective judgment (“high R²”)
Cannot measure overall multicollinearity strength
Because of these limitations, econometricians later developed a more systematic method:
the Farrar–Glauber test.
2. Farrar–Glauber Test of Multicollinearity
The Farrar–Glauber test is a more formal and statistical approach to detecting
multicollinearity.
It examines correlation among explanatory variables in three stages.
Think of it as a medical diagnosis:
Stage 1 → Check if disease exists
Stage 2 → Identify affected parts
Stage 3 → Measure severity
Stage 1: Overall Test of Multicollinearity
First, we check whether multicollinearity exists in the whole model.
We compute the correlation matrix of the independent variables and then calculate a Chi-
square statistic:
χ² = −[n − 1 − (2k + 5)/6] · ln|R|
Where:
n = number of observations
k = number of explanatory variables
|R| = determinant of correlation matrix
Interpretation
If calculated χ² > table χ² → multicollinearity exists
If calculated χ² ≤ table χ² → no multicollinearity
So this step tells: Is multicollinearity present?
Stage 2: Individual Variable Test
If Stage 1 confirms multicollinearity, we check which variables are involved.
For each independent variable:
Regress X₁ on other X’s (like Frisch test)
Calculate R²
Compute the F-statistic:
F = [R²ᵢ / (k − 1)] / [(1 − R²ᵢ) / (n − k)]
Interpretation
If F is significant → X₁ is collinear with others
If F not significant → X₁ is independent
So this stage identifies problematic variables.
Stage 3: Pairwise Correlation Test
Finally, we examine the correlation between each pair of independent variables using a t-test:
t = rᵢⱼ √(n − k) / √(1 − rᵢⱼ²)
Interpretation
Significant t → Xi and Xj highly correlated
Not significant → no strong pairwise relation
This step shows which pairs cause multicollinearity.
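For illustration, the Stage 1 statistic can be computed in a few lines of Python (a sketch assuming the χ² form given above, applied to simulated data):

```python
import numpy as np
from scipy import stats

def fg_stage1(X):
    """Farrar-Glauber overall test: chi2 = -[n - 1 - (2k + 5)/6] * ln|R|."""
    n, k = X.shape
    R = np.corrcoef(X, rowvar=False)       # correlation matrix of regressors
    chi2 = -(n - 1 - (2 * k + 5) / 6) * np.log(np.linalg.det(R))
    df = k * (k - 1) / 2
    return chi2, df, stats.chi2.sf(chi2, df)   # sf gives the p-value

rng = np.random.default_rng(5)
x1 = rng.normal(size=100)
X = np.column_stack([x1,
                     0.9 * x1 + 0.1 * rng.normal(size=100),  # collinear with x1
                     rng.normal(size=100)])
print(fg_stage1(X))   # large chi2, tiny p-value -> multicollinearity present
```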
Simple Intuitive Summary
Imagine independent variables are students in a group project.
Frisch test:
“Can one student’s work be explained by others?”
If yes → they overlap.
Farrar–Glauber test:
Stage 1: Is group copying happening?
Stage 2: Which students are copying?
Stage 3: Who copied from whom?
3. Comparison: Frisch vs Farrar–Glauber
Feature                       | Frisch Test           | Farrar–Glauber Test
Nature                        | Simple                | Formal statistical
Approach                      | Auxiliary regressions | 3-stage test
Output                        | R² judgment           | χ², F, t statistics
Identifies variables          | Yes                   | Yes
Measures overall collinearity | No                    | Yes
Complexity                    | Low                   | High
4. Importance in Econometrics
Detecting multicollinearity is important because it:
Inflates standard errors
Makes coefficients unstable
Produces wrong signs
Reduces reliability of policy conclusions
The Frisch and Farrar–Glauber tests were early but foundational methods that helped
economists understand regression problems in real-world data.
Final Conclusion
Frisch’s confluence test and the Farrar–Glauber test are classical econometric methods used
to detect multicollinearity among explanatory variables in regression analysis.
Frisch test checks whether one independent variable can be explained by others
using auxiliary regressions and R². A high R² indicates multicollinearity.
Farrar–Glauber test provides a formal statistical procedure in three stages: an overall χ²
test, individual F-tests, and pairwise t-tests, helping detect the existence, sources, and
strength of multicollinearity.
Together, these tests help researchers ensure that independent variables truly provide
unique information, making regression results more reliable and meaningful.
SECTION – D
7. What do you understand by the problem of autocorrelation?
Discuss in detail the Durbin–Watson test and the remedies of autocorrelation.
Ans: What is Autocorrelation?
In regression analysis, one key assumption is that the error terms (uₜ) are independent of
each other. Autocorrelation occurs when this assumption is violated, meaning the error
term in one period is correlated with the error term in another.
Definition: Autocorrelation (or serial correlation) is the correlation of a variable with
its own past values.
Context: It often arises in time-series data, where today’s errors are influenced by
yesterday’s errors.
Example: In economic data, inflation this year may be influenced by inflation last
year, leading to correlated residuals.
Why is Autocorrelation a Problem?
1. Unbiased but Inefficient Estimates: OLS estimates remain unbiased, but they no
longer have minimum variance.
2. Invalid Hypothesis Testing: Standard errors are underestimated, making t-tests and
F-tests unreliable.
3. Misleading Policy Decisions: In applied research, ignoring autocorrelation can lead
to wrong conclusions.
The Durbin–Watson Test
The Durbin–Watson (DW) test is the most widely used method to detect autocorrelation in
regression residuals.
Formula:
d = Σₜ₌₂ⁿ (eₜ − eₜ₋₁)² / Σₜ₌₁ⁿ eₜ²
Where:
eₜ = residual at time t
n = number of observations
Interpretation:
d ≈ 2: No autocorrelation
d close to 0: Positive autocorrelation
d close to 4: Negative autocorrelation
Critical Values:
The DW statistic is compared with tabulated lower (d_L) and upper (d_U) bounds.
If d < d_L: Evidence of positive autocorrelation
If d > d_U: No positive autocorrelation
If d_L ≤ d ≤ d_U: Inconclusive
Example:
Suppose we run a regression and get DW = 1.2. Since this is less than 2, it suggests positive
autocorrelation in the residuals.
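The statistic is easy to compute directly (statsmodels also ships a ready-made durbin_watson function). Here is a small illustrative Python sketch with simulated AR(1) residuals, an assumption chosen just to show the two cases:

```python
import numpy as np

def durbin_watson(e):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2); values near 2 mean no autocorrelation."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(6)
# Positively autocorrelated residuals: e_t = 0.7 * e_{t-1} + noise
e = np.zeros(100)
for t in range(1, 100):
    e[t] = 0.7 * e[t - 1] + rng.normal()

print(durbin_watson(e))                      # well below 2
print(durbin_watson(rng.normal(size=100)))   # close to 2
```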
Remedies for Autocorrelation
When autocorrelation is detected, several remedies can be applied:
1. Model Specification Correction
Sometimes autocorrelation arises because the model is misspecified (e.g., missing
variables, wrong functional form).
Remedy: Add relevant variables or transform the model appropriately.
2. Transformation of Variables
Use first differences: instead of modeling Yₜ, model ΔYₜ = Yₜ − Yₜ₋₁.
This often removes serial correlation in time-series data.
3. Generalized Least Squares (GLS)
GLS adjusts the estimation procedure to account for autocorrelation, producing
efficient estimates.
4. Cochrane–Orcutt Procedure
A specific iterative method to correct for first-order autocorrelation.
It estimates the autocorrelation coefficient (ρ) and transforms the data accordingly.
5. Newey–West Standard Errors
Even if autocorrelation remains, robust standard errors can be used to make
hypothesis testing valid.
Critical Reflection
Autocorrelation is especially common in economic, financial, and environmental
time-series data.
The Durbin–Watson test is simple and widely used, but it mainly detects first-order
autocorrelation.
Remedies like GLS and Cochrane–Orcutt are powerful but require careful application.
Conclusion
The problem of autocorrelation arises when regression errors are correlated across time,
violating a key OLS assumption. The Durbin–Watson test provides a practical way to detect
it, with values close to 2 indicating no autocorrelation. Remedies include correcting model
specification, differencing variables, using GLS, or applying robust standard errors.
8. (a) Discuss the Koyck approach to distributed lag models.
(b) How is the dummy variable model an alternative to the Chow test?
Ans: 8(a) Koyck Approach to Distributed Lag Models
The basic idea: effects don't always happen instantly
In economics and social sciences, many things don’t affect outcomes immediately. Their
impact spreads over time.
For example:
If the government increases advertising expenditure, sales don’t jump instantly.
If interest rates fall, investment increases gradually.
If rainfall improves, agricultural output rises over several seasons.
This is called a distributed lag effect, where one variable influences another over several
time periods.
So instead of saying:
Yₜ = α + βXₜ + uₜ
we say:
Yₜ = α + β₀Xₜ + β₁Xₜ₋₁ + β₂Xₜ₋₂ + ⋯ + uₜ
Meaning:
Today's Y depends on current X, last year's X, the year before that, and so on.
This is called a distributed lag model.
The problem with distributed lags
This model looks nice, but in practice it creates big issues:
Too many lag variables (Xₜ, Xₜ₋₁, Xₜ₋₂, …)
Multicollinearity (lags are highly correlated)
Loss of degrees of freedom
Difficult estimation
So economists wanted a simpler way.
Enter the Koyck Approach
The Koyck method gives a clever shortcut.
It assumes the lag coefficients decline geometrically over time.
Meaning:
βₖ = β₀λᵏ,  k = 0, 1, 2, …
where 0 < λ < 1
Example intuition:
An advertisement has the strongest impact today, less tomorrow, even less later.
How Koyck transforms the model
Start with the infinite distributed lag:
Yₜ = α + β₀Xₜ + β₀λXₜ₋₁ + β₀λ²Xₜ₋₂ + ⋯ + uₜ
Now shift the equation one period back:
Yₜ₋₁ = α + β₀Xₜ₋₁ + β₀λXₜ₋₂ + ⋯ + uₜ₋₁
Multiply by λ:
λYₜ₋₁ = λα + β₀λXₜ₋₁ + β₀λ²Xₜ₋₂ + ⋯ + λuₜ₋₁
Now subtract this from the original equation.
Most lag terms cancel out.
Result:
Yₜ = α(1 − λ) + β₀Xₜ + λYₜ₋₁ + vₜ
where
vₜ = uₜ − λuₜ₋₁
Final Koyck Model
So the infinite lag model becomes:
Yₜ = α(1 − λ) + β₀Xₜ + λYₜ₋₁ + vₜ
This is much easier because:
Only current X is needed
Plus lagged Y
No infinite lags
This is the Koyck transformation.
Interpretation in simple words
The model says:
Today's outcome depends on:
today's input (Xₜ)
yesterday's outcome (Yₜ₋₁)
So the past influence is carried through λYₜ₋₁.
Advantages of Koyck Approach
Converts infinite lags into simple model
Reduces multicollinearity
Saves degrees of freedom
Easy estimation
Limitations
Assumes geometric decay (may not always hold)
Error term becomes autocorrelated
Needs special estimation methods
Real-life intuition
Think of habit formation:
If someone exercised yesterday, they’re more likely to exercise today.
So:
Today’s exercise =
influence of today’s motivation + yesterday’s exercise
That’s exactly the Koyck idea.
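A small simulation sketch (Python, all parameter values hypothetical) shows how the transformed Koyck equation is estimated: generate data from Yₜ = α(1 − λ) + β₀Xₜ + λYₜ₋₁ + noise, then regress Yₜ on Xₜ and Yₜ₋₁. Note that with the true Koyck error vₜ = uₜ − λuₜ₋₁, OLS would need correction; here the error is kept i.i.d. purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
T, alpha, beta0, lam = 300, 2.0, 1.5, 0.6

x = rng.normal(size=T)
y = np.zeros(T)
# Simulate the transformed Koyck equation (with i.i.d. error, for simplicity)
for t in range(1, T):
    y[t] = alpha * (1 - lam) + beta0 * x[t] + lam * y[t - 1] + rng.normal(0, 0.1)

# Estimate by regressing Y_t on X_t and Y_{t-1}
Z = np.column_stack([np.ones(T - 1), x[1:], y[:-1]])
coef, *_ = np.linalg.lstsq(Z, y[1:], rcond=None)
print(coef)   # expect roughly [alpha*(1-lam)=0.8, beta0=1.5, lam=0.6]
```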
8(b) Dummy Variable Model as an Alternative to Chow Test
Now let’s move to the second part.
The core question: Are two groups different?
Economists often want to test whether two time periods or groups have different
relationships.
Example:
Before vs after economic reforms
Rural vs urban markets
Male vs female wages
Two countries
We want to know:
Is the regression equation the same or different?
Traditional method: Chow Test
The Chow test checks whether regression coefficients differ across groups.
Example:
Test if:
β’s before reform = β’s after reform ?
But Chow test has limitations:
Needs separate regressions
Requires equal error variance
Less flexible
So econometricians use a simpler method:
Dummy variable model
What is a dummy variable?
A dummy variable is just a 0-1 indicator.
Example:
D = 1 if after reform
D = 0 if before reform
Dummy Variable Regression Model
We include the dummy variable directly in the regression:
Y = β₀ + β₁X + β₂D + β₃(D·X) + u
This single equation captures differences between groups.
Interpretation of terms
β₀ → intercept before reform
β₁ → slope before reform
β₂ → change in intercept after reform
β₃ → change in slope after reform
So:
Before reform (D = 0):
Y = β₀ + β₁X
After reform (D = 1):
Y = (β₀ + β₂) + (β₁ + β₃)X
Two regressions in one equation!
Why the dummy variable model replaces the Chow test
Instead of running separate regressions and comparing, we estimate:
one combined regression
Then test:
β₂ = 0 → intercept same
β₃ = 0 → slope same
If both zero → no structural change
This is exactly what Chow test checks.
So dummy regression is an alternative.
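A hedged Python sketch of this idea (assuming statsmodels is installed; the data and break point are simulated assumptions): fit the single dummy-interaction regression and jointly test β₂ = β₃ = 0 with an F-test.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 200
x = rng.uniform(0, 10, n)
D = (np.arange(n) >= 100).astype(float)     # 0 before "reform", 1 after
# True structural break: intercept shifts by 2, slope by 0.5 after the reform
y = 3 + 1.0 * x + D * (2 + 0.5 * x) + rng.normal(0, 1, n)

Z = np.column_stack([np.ones(n), x, D, D * x])   # [const, X, D, D*X]
res = sm.OLS(y, Z).fit()

# Joint test H0: beta2 = beta3 = 0 (no structural change), the Chow-type hypothesis
R = np.array([[0, 0, 1, 0],
              [0, 0, 0, 1]])
print(res.f_test(R))   # small p-value -> reject "no structural change"
```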
Advantages over Chow Test
Single regression
Works with unequal sample sizes
Allows many groups
Flexible structural change testing
Easy hypothesis testing
Final Summary
(a) Koyck approach converts an infinite distributed lag model into a simple regression with
current X and lagged Y by assuming geometrically declining lag coefficients. It reduces
multicollinearity and simplifies estimation but assumes geometric decay and introduces
autocorrelation.
(b) Dummy variable model provides an alternative to the Chow test by estimating a single
regression with dummy and interaction terms to test whether intercepts and slopes differ
across groups or time periods. It is more flexible and practical than separate regressions
used in the Chow test.
This paper has been carefully prepared for educational purposes. If you notice any
mistakes or have suggestions, feel free to share your feedback.